
Revert "data/aws: Switch to m4.large" #858

Merged Dec 11, 2018 (1 commit)

Conversation

@wking (Member) commented Dec 10, 2018

This reverts commit a230376 (#765), now that we've bumped our t3.medium limits in the CI and dev accounts to cover our expected loads. Moving from m4.large to t3.medium also reduces memory from 8 GiB to 4 GiB, but after a recent run of end-to-end tests, the master consumption was:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7980        2263         794           8        4922        5259
Swap:             0           0           0

so 4 GiB should be sufficient. And it also matches our libvirt setup since e59513f (#785).

CC @abhinavdahiya, @crawford, @smarterclayton, @cuppett

The full commit message:

This reverts commit a230376
(data/aws: Switch to m4.large, 2018-11-29, openshift#765), now that we've
bumped our t3.medium limits in the CI and dev accounts to cover our
expected loads.  Moving from m4.large to t3.medium also reduces memory
from 8 GiB [1] to 4 GiB [2], but after a recent run of end-to-end
tests, the master consumption was:

  $ free -m
                total        used        free      shared  buff/cache   available
  Mem:           7980        2263         794           8        4922        5259
  Swap:             0           0           0

so 4 GiB should be sufficient.  And it also matches our libvirt setup
since e59513f (libvirt: Add Terraform variables for memory/CPU, bump
master to 4GiB, 2018-12-04, openshift#785).

[1]: https://aws.amazon.com/ec2/instance-types/#General_Purpose
[2]: https://aws.amazon.com/ec2/instance-types/t3/#Product_Details
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 10, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2018
@crawford (Contributor)

Can we run this several times to make sure that it reliably succeeds? (Requires reliable CI)

@wking (Member, author) commented Dec 10, 2018

> Can we run this several times to make sure that it reliably succeeds? (Requires reliable CI)

Yup, no rush. And I think @smarterclayton was considering e2e reruns to test reliability across the board, although I agree that this change is especially likely to be flake-inducing. Once CI is reliably green for other PRs, I'll /test e2e-aws here a few times.

@cgwalters (Member) commented Dec 10, 2018

Unless there's a specific reason not to, let's keep AWS and libvirt in sync on this in the future. So maybe add a comment to each, like `# If changing the master memory, also change libvirt (and vice versa)`, or even just use a common TF variable?
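A minimal sketch of what the "common TF variable" idea could look like (the variable name and defaults here are hypothetical, not taken from the installer):

```hcl
# Hypothetical shared sizing knob (name is illustrative only).
variable "master_memory_mib" {
  description = "Memory per master, consumed by both the libvirt and AWS modules"
  default     = 4096
}
```

The catch is that libvirt could consume such a value directly, while AWS would still need to map the CPU/memory target onto a concrete instance type.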

@wking (Member, author) commented Dec 11, 2018

> So maybe add a comment to each like `# If changing the master memory, also change libvirt (and vice versa)`...

I'm fine with or without this. Ideally, all platforms are under CI (while we currently have e2e tests for libvirt, they always fail because the cluster lacks a publicly routable API), and that's good enough for me. Keeping the platforms similar is nice, but especially with libvirt, the goals really differ (short-term dev clusters on libvirt vs. the possibility of long-term, eventually production clusters on AWS and other providers). So I don't mind if our default provisioning diverges a bit either.

> ... or even just use a common TF variable?

I don't see a way to do this, because for a given CPU and memory target, there are quite a few AWS instance types that match, and they all have different minor properties (burstable or not, etc.) which are not captured in the libvirt parameters. I'm very committed to having a shared installer core targeting Kubernetes abstractions that is platform-agnostic, but once we get out into the cluster API and installer-provisioned infrastructure, I'm happy to let the individual providers go their own way.

In other news, here's one successful e2e-aws. Let's try for another:

/test e2e-aws

@wking (Member, author) commented Dec 11, 2018

And another successful e2e-aws. Checking again:

/test e2e-aws

@wking (Member, author) commented Dec 11, 2018

And another successful e2e-aws. @crawford, how many of these did you want? ;)

@crawford (Contributor)

I don't think I've ever seen three in a row...

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 11, 2018
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@crawford (Contributor)

4/5 is still pretty good. I knew I jinxed it with that last comment.

@abhinavdahiya (Contributor)

Is there a way we can watch the cpu / memory utilization during the test run? @wking

@openshift-merge-robot openshift-merge-robot merged commit 6adb78a into openshift:master Dec 11, 2018
@wking wking deleted the back-to-t3.medium branch December 11, 2018 18:56
@wking (Member, author) commented Dec 11, 2018

> Is there a way we can watch the cpu / memory utilization during the test run?

CloudWatch has graphs; I forget whether they charge to show memory or not. I just ran a loop to SSH into the masters in turn and run `free -m`. It seemed stable, but it's obviously not very fine-grained. And CI doesn't run for that long.
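The ad-hoc loop described above can be sketched like this (the master hostnames and `core` SSH user are assumptions for illustration, not from the PR; the `awk` line is what pulls the "used" column out of `free -m` output like the sample quoted in the description):

```shell
#!/bin/sh
# Hypothetical polling loop (hosts and SSH user are placeholders):
#   while true; do
#     for host in master-0 master-1 master-2; do
#       ssh core@"$host" free -m | awk -v h="$host" '/^Mem:/ {print h ": " $3 " MiB used"}'
#     done
#     sleep 30
#   done

# The extraction itself, demonstrated on the output quoted in the description:
sample='              total        used        free      shared  buff/cache   available
Mem:           7980        2263         794           8        4922        5259
Swap:             0           0           0'
printf '%s\n' "$sample" | awk '/^Mem:/ {print $3 " MiB used"}'
# Prints: 2263 MiB used
```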

@cuppett (Member) commented Dec 11, 2018

CloudWatch shows metrics external to the VM, like CPU and IOPS. To see memory or per-filesystem disk usage from inside the VM in CloudWatch, most people run their agent.
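For concreteness, the agent in question is the CloudWatch agent; a minimal agent config that reports memory from inside the VM looks roughly like this (a sketch based on the agent's documented schema; installation steps and other fields omitted):

```json
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
```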
